Question 1

First, the data is read and the required changes are applied to the data set. Then, since acceleration and velocity do not give much information about the shapes, I calculated the position vectors of the gestures.
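The velocity-to-position step can be sketched as cumulative integration over the sampling interval (a toy example; `dt`, `vx`, and `px` are illustrative names, not the assignment's variables):

```r
# Sketch: recovering positions from a velocity series by cumulative
# trapezoidal integration. Values here are toy data.
dt <- 0.01                                # assumed sampling interval (s)
vx <- c(0, 1, 2, 2, 1, 0)                 # toy x-velocity readings
# position = running integral of velocity, starting at 0
px <- cumsum(c(0, (head(vx, -1) + tail(vx, -1)) / 2 * dt))
px                                        # -> 0.000 0.005 0.020 0.040 0.055 0.060
```

The same computation would be applied per axis to obtain the 3D trajectory of each gesture.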

The figure below shows the position of Gesture 2. We can see the closed triangle-like shape when viewed from the right angle.

For different k values (1, 3, 5) and different distance measures (Euclidean and Manhattan), the k-Nearest Neighbor algorithm is applied to the data set.

The accuracy information for different measures can be seen below:

cvresult[,list(Accu=mean(Predictions==Real)),by=list(Distance,Klev)]
##     Distance Klev      Accu
## 1: Euclidian    1 0.9441964
## 2: Euclidian    3 0.9464286
## 3: Euclidian    5 0.9419643
## 4: Manhattan    1 0.9564732
## 5: Manhattan    3 0.8895089
## 6: Manhattan    5 0.8337054
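The core of the procedure can be sketched in base R with a pluggable distance function (`knn_predict` and the data below are illustrative, not the code that produced the table above):

```r
# Minimal k-NN classifier sketch with a swappable distance measure.
knn_predict <- function(train, labels, query, k = 1,
                        dist = function(a, b) sqrt(sum((a - b)^2))) {
  d  <- apply(train, 1, dist, b = query)   # distance from query to each row
  nn <- order(d)[seq_len(k)]               # indices of the k nearest rows
  names(which.max(table(labels[nn])))      # majority vote among neighbors
}

train  <- rbind(c(0, 0), c(0, 1), c(5, 5))
labels <- c("A", "A", "B")

knn_predict(train, labels, c(0.2, 0.4), k = 1)               # -> "A"
manhattan <- function(a, b) sum(abs(a - b))
knn_predict(train, labels, c(4, 5), k = 1, dist = manhattan) # -> "B"
```

Swapping the `dist` argument is all that changes between the Euclidean and Manhattan runs.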

With the Euclidean distance measure, k=1 and k=3 performed almost the same (k=3 slightly better), while with the Manhattan distance measure k=1 is clearly the best option, with accuracy dropping sharply for larger k.

Confusion matrix for the Euclidean distance measure with k=3:

ConfusionMatrix
##             Actual 1 Actual 2 Actual 3 Actual 4 Actual 5 Actual 6 Actual 7
## Predicted 1      431        0        0        2        0        4        0
## Predicted 2        1      449        0        0        0        0        2
## Predicted 3        2        0      413        0       15       20        4
## Predicted 4        7        0        0      370       60        6        0
## Predicted 5        3        0        6        1      422        1        0
## Predicted 6        6        0        7       15       28      392        1
## Predicted 7        0        0        1        0        0        0      446
## Predicted 8        0        0        0        1        1        0        0
##             Actual 8
## Predicted 1        0
## Predicted 2        0
## Predicted 3        0
## Predicted 4        7
## Predicted 5        0
## Predicted 6        0
## Predicted 7        0
## Predicted 8      458
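A confusion matrix of this kind, and the accuracy derived from it, can be built with base R's `table()` (toy vectors below, not the gesture predictions):

```r
# Cross-tabulating predictions against true labels gives the confusion
# matrix; the diagonal holds the correctly classified cases.
pred <- c(1, 2, 2, 3, 3, 3)
real <- c(1, 2, 3, 3, 3, 1)
cm <- table(Predicted = pred, Actual = real)
cm
sum(diag(cm)) / sum(cm)   # overall accuracy: 4 correct out of 6
```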

Confusion matrix for the Manhattan distance measure with k=1:

ConfusionMatrix2
##             Actual 1 Actual 2 Actual 3 Actual 4 Actual 5 Actual 6 Actual 7
## Predicted 1      432        0        0        1        0        4        0
## Predicted 2        1      451        0        0        0        0        0
## Predicted 3        2        0      417        0       16       14        5
## Predicted 4        3        0        0      395       41        8        0
## Predicted 5        3        0        9        4      416        1        0
## Predicted 6        3        0        6       12       15      412        0
## Predicted 7        0        0        3        0        0        0      444
## Predicted 8        0        0        0        3        1        0        0
##             Actual 8
## Predicted 1        0
## Predicted 2        0
## Predicted 3        0
## Predicted 4        3
## Predicted 5        0
## Predicted 6        1
## Predicted 7        0
## Predicted 8      456

The best Manhattan accuracy is slightly higher than the best Euclidean accuracy. However, computing the Manhattan distances took longer than computing the Euclidean distances for the same task. The two distance functions come from different packages, so the runtime gap may stem from implementation differences rather than from the metrics themselves. Both accuracy results look good, at around 95%.
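One way to control for the package effect would be to compute both metrics with the same base R function, `dist()`; a rough timing sketch (toy matrix, timings will vary by machine, so none are asserted here):

```r
# Both metrics via base R's dist(), removing the cross-package confound.
set.seed(1)
x <- matrix(rnorm(200 * 50), nrow = 200)   # toy data: 200 rows, 50 columns
t_euc <- system.time(dist(x, method = "euclidean"))["elapsed"]
t_man <- system.time(dist(x, method = "manhattan"))["elapsed"]
```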

Question 2

In this question, ECG data is used to train a penalized logistic regression model. I used the training data, with 10-fold cross-validation, to select the lambda value and estimate the coefficients. The selected lambda and related information can be seen below; it is the best of 10 candidate values.
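The 10-fold split behind the cross-validation can be sketched as a random fold assignment per row (`n` is a toy sample size, not the ECG data's):

```r
# Each row is randomly assigned one of 10 fold ids; each fold is held out
# once while the model is fit on the other nine.
set.seed(1)
n <- 100
fold <- sample(rep(1:10, length.out = n))
table(fold)   # ten folds of equal size
```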

CVFused$fullfit
## Penalized logistic regression object
## 96 regression coefficients of which 17 are non-zero
## 
## Loglikelihood =   -16.29748 
## L1 penalty =  8.274042   at lambda1 =  0.6191571
coefficients(CVFused$fullfit)
##  [1] -0.38037292  0.25834966 -0.94496658 -0.08765365 -0.24216362
##  [6] -0.13296574  1.61915515 -1.59341590 -0.46588088  0.11136052
## [11]  0.74964429  0.62202227 -1.48478948 -1.40514383  0.40676035
## [16] -0.79970682 -2.05904590

The model keeps 17 non-zero coefficients out of 96, which means only 17 features are selected to predict the response. From the magnitudes of the coefficients, we can see that some features play a more important role than others in explaining the response.
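The coefficient vector printed above can be summarized like this (`coefs` is a toy stand-in for `coefficients(CVFused$fullfit)`):

```r
# Counting selected features and flagging the most influential ones.
coefs <- c(-0.38, 0.26, 0, -0.94, 0, 1.62)   # toy sparse coefficient vector
sum(coefs != 0)              # number of selected (non-zero) features -> 4
which(abs(coefs) > 1)        # index of the largest-magnitude effect -> 6
```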

Using the model on the training data, the accuracy is calculated.

Predictions[,list(Accu=mean(Pred==Real))]
##    Accu
## 1: 0.83

The accuracy seems decent at 83%. Then, I used the model on the test data to predict the test classes.

The accuracy on the test data is 81%, which is quite good, being close to the training accuracy.

TestPredictions[,list(Accu=mean(Pred==Real))]
##    Accu
## 1: 0.81

Then I created the Confusion Matrix, which can be seen below:

ConfusionMatrix3
##             Actual 0 Actual 1
## Predicted 0       27        9
## Predicted 1       10       54
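For a 2x2 matrix like the one above, accuracy, sensitivity, and specificity follow directly from the cell counts (the matrix below copies the values from `ConfusionMatrix3`):

```r
# Binary classification metrics from the test confusion matrix.
cm <- matrix(c(27, 10, 9, 54), nrow = 2,
             dimnames = list(Predicted = 0:1, Actual = 0:1))
accuracy    <- sum(diag(cm)) / sum(cm)   # (27 + 54) / 100 = 0.81
sensitivity <- cm[2, 2] / sum(cm[, 2])   # 54 / 63, true positives found
specificity <- cm[1, 1] / sum(cm[, 1])   # 27 / 37, true negatives found
```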

Then, the difference of the training data is taken and the same procedure is applied to the new data. Again, 10-fold cross-validation is used, and the corresponding lambda and coefficients are calculated.
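Differencing can be done with base R's `diff()`, which drops one observation per series; this is consistent with the model below having 95 coefficients where the undifferenced model had 96 (toy series for illustration):

```r
# diff() returns consecutive differences, shortening the series by one.
x <- c(3, 5, 4, 8)
diff(x)   # -> 2 -1 4
```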

CVFused2$fullfit
## Penalized logistic regression object
## 95 regression coefficients of which 15 are non-zero
## 
## Loglikelihood =   -29.55598 
## L1 penalty =  12.2022    at lambda1 =  1.280988
coefficients(CVFused2$fullfit)
##  [1]  1.04065822 -0.60355553  0.44621818  0.38466214  0.85067811
##  [6]  0.08177601  0.41162185 -0.74077949 -0.20876553  0.15163962
## [11]  1.31917864  0.70751144 -1.59741629 -0.61495544 -0.36620019

In this model, there are 15 non-zero coefficients. Compared to the first model, this one uses slightly fewer variables to explain the response. Again, some coefficients help explain the data more than others.

Accuracy of the differenced training data can be seen below:

Predictions2[,list(Accu=mean(Pred==Real))]
##    Accu
## 1: 0.76

Then, difference of the test data is calculated and the accuracy is found accordingly.

TestPredictions2[,list(Accu=mean(Pred==Real))]
##    Accu
## 1: 0.81

The test accuracy (81%) is higher than the training accuracy (76%). I think this happened by chance, since training accuracies are normally higher than test accuracies.

Confusion matrix for the test results of the differenced dataset:

ConfusionMatrix4
##             Actual 0 Actual 1
## Predicted 0       24       12
## Predicted 1        7       57